Image-based risk score-a prognostic predictor of survival and outcome from digital histopathology

ABSTRACT

The present invention relates to an image-based computer-aided prognosis (CAP) system and method that seeks to replicate the prognostic power of molecular assays in histopathology and pathological processes, including but not limited to cancer. Using only a tissue slide samples, a mechanism for digital slide scanning, and a computer, the present invention relates to an image-based CAP system and method which aims to overcome many of the drawbacks associated with prognostic molecular assays (e.g. Oncotype DX) including the high cost associated with the assay, limited laboratory facilities with specialized equipment, and length of time between biopsy and prognostic prediction.

CROSS REFERENCE TO RELATED APPLICATIONS

This application claims priority to U.S. Provisional Patent ApplicationNo. 61/149,158 filed on Feb. 2, 2009, the disclosure of which is herebyincorporated by reference in its entirety.

FIELD OF THE INVENTION

This invention concerns a computer-aided prognosis (CAP) system andmethod that employs quantitatively derived image information to predictpathological processes, disease outcome and patient survival. Whiledigital pathology has made tissue specimens amenable to computer-aideddiagnosis (CAD) for disease detection, the CAP system and method of thepresent invention predicts disease outcome and patient survival.

BACKGROUND

The current gold standard in identifying many disease states is thesubjective visual interpretation of microscopic histology of fixedtissue sections of the involved organ. Examples of this include thediagnosis of cancer as well as many inflammatory and degenerativediseases. Over the past decade, the increasing ability to assay genomicinformation led to improved classification of a variety of pathologicalprocesses using diagnostic, prognostic patterns of gene expressionand/or genomic changes. The present invention details an automated,computerized system and method for analyzing histopathology imagery thatwill produce a quantitative and reproducible metric, i.e. Image-basedRisk Score, for predicting disease outcome and patient survival.Following are two specific embodiments of the present invention that usebreast cancer as a model disease state where well-validated geneexpression-based classifiers have led to significant clinical impact.

Breast cancer (BC) is one of the leading causes of cancer-related deathsin women, with an estimated annual incidence greater than 192,000 in theUnited States in 2009 (source: American Cancer Society).

One embodiment of the present invention involves a subset of BC thatcomprises cancer cells that have not spread to the lymph nodes and withoverexpression of the estrogen receptor (LN−, ER+ BC). Although cases ofLN−, ER+ BC are treated with a combination of chemotherapy and adjuvanthormone therapy, the specific prognosis and treatment is oftendetermined by the Oncotype DX gene expression assay [1]. The Oncotype DXgene expression assay produces a Recurrence Score (RS) between 0-100that is positively correlated to the likelihood for distant recurrenceand the expected benefit from chemotherapy [1].

The manual detection of BC nuclei in histopathology is a tedious andtime-consuming process that is unfeasible in the clinical setting.Previous approaches to cell segmentation—thresholding [2], clustering[3], and active contour models [4]—are not very robust to the highlyvariable shapes and sizes of BC nuclei, as well as artifacts in thehistological fixing, staining, and digitization processes.

Previous work [1] has shown that the Oncotype DX RS is correlated withBC grade. Cancer grade reflects the architectural arrangement of thetissue and is correlated with survival (high grade implies pooroutcome). Pathologists often disagree on the grade of a BC study. Withthe recent advent of digital pathology, researchers have begun toexplore automated image analysis of BC histopathology. Wolberg et al.[6] used nuclear features from manually segmented BC nuclei todistinguish benign and malignant images. Bilgin et al. [7] explored theuse of hierarchical graphs to model the architecture of BChistopathology. Textural features were used by Hall et al. [8] toexamine variations in immunohistochemical staining.

A second embodiment of the present invention involves a subset ofinvasive BC that includes the presence of lymphocytic infiltration (LI)and exhibits amplification of the HER2 gene (HER2+ BC). Most HER2+ BC iscurrently treated with agents that specifically target the HER2 protein.Researchers have shown that the presence of LI in histopathology is aviable prognostic indicator for various cancers, including HER2+ BC[13]-[15]. The function of LI as a potential antitumor mechanism in BCwas first shown by Aaltomaa et al. [14]. More recently, Alexe et al.[15] demonstrated a correlation between the presence of high levels ofLI and tumor recurrence in early stage HER2+ BC. Pathologists do notroutinely report on the presence of LI, especially in HER2+ BC. Apossible reason for this is that pathologists currently lack theautomated image analysis tools to accurately, efficiently, andreproducibly quantify the presence and degree of LI in BChistopathology.

While some researchers [9], [16]-[21] have recently begun to developcomputer-aided diagnosis (CADx) system and methods for the analysis ofdigitized BC histopathology, they have mostly focused on either findingsuspicious regions of interest (ROI) or have attempted to determinecancer grade from manually isolated ROIs. The methods for bothapplications use image-based features to discriminate between 2 classes:either normal and benign regions or low and high grade ROIs.Specifically, the size and shape of cancer nuclei have been shown todistinguish low and high grade histology images [16], [9]. Texturalfeatures and filter banks have also been employed [16]-[19], [21] tomodel the phenotypic appearance of BC histopathology.

While several researchers have been developing algorithms for detectionof nuclei [18], [23]-[29] in digitized histopathology, there have beenno attempts to automatically detect or quantify extent of LI on BChistopathology. Some popular approaches to automated nuclear detectionare based on adaptive thresholding [18], [23] and fuzzy c-meansclustering [25], [27]. These techniques rely on differences in stainingto distinguish nuclei from surrounding tissue. However, they are notappropriate for the task of LI detection due to the similarity inappearance between BC and lymphocyte nuclei (FIG. 4( a)). Techniquessuch as active contours [24], [28], [29] have utilized gradient (edge)information to automatically isolate nuclei in histological images.These methods, however, might be limited in their ability to handlevariations in the appearance of BC nuclei (FIGS. 4( b), (c)) and imageacquisition artifacts (FIGS. 4( e), (f)). Some researchers havedeveloped hybrid techniques in order to improve nuclear detection andsegmentation results. For example, Glotsos et. al. [28] used SupportVector Machine clustering to improve initialization for active contourmodels. More recently, semi-automated probabilistic models have usedpixel-wise intensity information to detect cancer [26] and lymphocytenuclei [30] in digitized BC histopathology. Probabilistic models,however, are usually limited by the availability of expert-annotatedtraining data.

Detection of lymphocytes alone, however, cannot completely characterizethe abnormal LI phenotype because a baseline level of lymphocytes ispresent in all tissues. Gunduz et al. [20] explored automated cancerdiagnosis by using hierarchical graphs to model tissue architecture,whereby a graph is defined as a set of vertices (nuclei) withcorresponding edges connecting all nuclei.

SUMMARY OF THE INVENTION

The present invention relates to an image-based computer-aided prognosis(CAP) system that predict disease outcome and patient survival fromdigitized histopathology imagery. The present invention details anautomated, computerized system and method for analyzing histopathologyimagery that will produce a quantitative and reproducible metric, i.e.Image-based Risk Score, for predicting disease outcome and patientsurvival.

The present invention relates to an image-based computer-aided prognosis(CAP) system and method that seeks to replicate the prognostic power ofmolecular assays in histopathology and pathological processes, includingbut not limited to cancer.

In an embodiment of the present invention, an image-based computer-aidedprognosis (CAP) system and method that seeks to replicate the prognosticpower of molecular assays in cancer histopathology is developed. Thesystem and method of the present invention involves firstsemi-automatically detecting BC nuclei via an Expectation Maximizationdriven algorithm. Using the nuclear centroids, two graphs (DelaunayTriangulation and Minimum Spanning Tree) are constructed and a total of12 features are extracted from each image. A non-linear dimensionalityreduction system and method, Graph Embedding, projects the image-derivedfeatures into a low-dimensional space, and a Support Vector Machineclassifies the BC images in the reduced dimensional space. Using onlythe tissue slide samples, a mechanism for digital slide scanning, and acomputer, the image-based CAP system and method of the present inventionaims to overcome many of the drawbacks associated with Oncotype DX,including the high cost associated with the assay; limited laboratoryfacilities with specialized equipment, and length of time between biopsyand prognostic prediction.

The present invention involves key methodological contributions and theuse of several state of the art machine learning system and methods,including, but not limited to: a robust, efficient method toautomatically detect BC nuclei; image features to describe thearchitectural arrangement of BC nuclei and hence, quantitativelydescribe cancer grade, and the use of non-linear dimensionalityreduction to classify and visualize the underlying biological datastructure in a low-dimensional representation.

It is an object of the present invention to provide a semi-automatednuclear detection system and method based on the ExpectationMaximization (EM) algorithm.

In accordance with the above object, the present invention derivesarchitectural features to characterize the arrangement of BC nuclei andhence capture BC grade.

In an embodiment of the present invention, a computerized system andmethod is disclosed to automatically detect and grade the extent of LIin digitized HER2+ BC histopathology. Lymphocytes are firstautomatically detected by a combination of region growing and MarkovRandom Field algorithms. Using the centers of individual detectedlymphocytes as vertices, three graphs (Voronoi Diagram, DelaunayTriangulation, and Minimum Spanning Tree) are constructed and a total of50 image-derived features describing the arrangement of the lymphocytesare extracted from each sample. A non-linear dimensionality reductionsystem and method, Graph Embedding, is then used to project thehigh-dimensional feature vector into a reduced 3D embedding space. ASupport Vector Machine classifier is used to discriminate samples withhigh and low LI in the reduced dimensional embedding space.

An embodiment of the present invention relates to an image-based riskscore predictor method for measuring cancer extent to evaluate diseaseoutcome in cancer patients using digitized histopathology comprising: i.scanning stained histopathology slides into a computer using a highresolution whole slide scanner; ii. detecting cancer nuclei using anExpectation-Maximization based algorithm; iii. constructing the DelaunayTriangulation and Minimum Spanning Tree graphs using the centers ofindividual detected cancer nuclei as vertices; iv. extractingimage-derived features describing the arrangement of the cancer nucleifrom each image; v. projecting high-dimensional image derived featurevector into a reduced 3D embedding space via Graph Embedding; vi.unwrapping the 3D embedding into a 1D scale to define image-based riskscores for poor, intermediate, and good outcome; vii. determining animage-based recurrence score to distinguish between low, intermediate,and high cancer grades by uncovering the grade labels of the samples onthe 1D line and their relative locations to predict prognosis.

An embodiment of the present invention relates to an image-based riskscore predictor method for measuring extent of pathological processes toevaluate disease outcome in patients using digitized histopathologycomprising: i. scanning stained histopathology slides into a computerusing a high resolution whole slide scanner; ii. detecting pathologicalnuclei using an Expectation-Maximization based algorithm; iii.constructing the Delaunay Triangulation and Minimum Spanning Tree graphsusing the centers of individual detected pathological nuclei asvertices; iv, extracting image-derived features describing thearrangement of the pathological nuclei from each image; v. projectinghigh-dimensional image derived feature vector into a reduced 3Dembedding space via Graph Embedding; vi. unwrapping the 3D embeddinginto a 1D scale to define image-based risk scores for poor,intermediate, and good outcome; vii. determining an image-basedrecurrence score to distinguish between low, intermediate, and highcancer grades by uncovering the grade labels of the samples on the 1Dline and their relative locations to predict prognosis.

An embodiment of the present invention relates to an image-based riskscore predictor method for measuring cancer extent to evaluate diseaseoutcome in node-negative, estrogen receptor-positive breast cancerpatients using digitized histopathology comprising: i. scanning stainedhistopathology slides into a computer using a high resolution wholeslide scanner; ii. detecting cancer nuclei using anExpectation-Maximization based algorithm; iii. constructing the DelaunayTriangulation and Minimum Spanning Tree graphs using the centers ofindividual detected cancer nuclei as vertices; iv. extractingimage-derived features describing the arrangement of the cancer nucleifrom each image; v. projecting high-dimensional image derived featurevector into a reduced 3D embedding space via Graph Embedding; vi.unwrapping the 3D embedding into a 1D scale to define image-based riskscores for poor, intermediate, and good outcome; vii. determining animage-based recurrence score to distinguish between low, intermediate,and high cancer grades by uncovering the grade labels of the samples onthe 1D line and their relative locations to predict prognosis.

An embodiment of the present invention relates to a method for measuringextent of lymphocytic infiltration to evaluate disease outcome in breastcancer patients expressing the human epidermal growth factor receptor 2(HER2) comprising: i. scanning stained histopathology slides into acomputer using a high resolution whole slide scanner; ii. detectinglymphocyte nuclei using a combination of region-growing and MarkovRandom Field algorithms; iii. constructing a Voronoi Diagram, DelaunayTriangulation, and Minimum Spanning Tree using the centers of individualdetected lymphocyte nuclei as vertices; iv. extracting image-derivedfeatures describing the arrangement of the lymphocyte nuclei from eachimage; v. projecting high-dimensional image derived feature vector intoa reduced 3D embedding space via Graph Embedding.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1( a) depicts a LN−, ER+ BC histopathology image. FIG. 1. (b)depicts a corresponding EM-based segmentation of BC nuclei. Thesegmentation in FIG. 1( b) is FIG. 1( c) smoothed to help detectindividual nuclei. FIG. 1( d) depicts a final detected nuclear centroids(black dots) used for feature extraction.

FIG. 2( a) and FIG. 2( d) depict low and high grade LN−, ER+ BC samples.FIG. 2( b) and FIG. 2( e) depict Delaunay Triangulation. FIG. 2( c) andFIG. 2( f) depict Minimum Spanning Tree graphs overlaid.

FIG. 3 depicts Graph Embedding plots of architectural features showclear separation of different FIG. 3. FIG. 3( a) depicts BC grades. FIG.3( b) depicts RS labels. The embeddings are projected into a 1D line,where FIG. 3( c) depicts BC grade and FIG. 3( d) depicts RS defined by asingle score.

FIG. 4( a) depicts the similarity in appearance between a cancer cellnucleus and a lymphocyte nucleus. In general, lymphocyte nuclei aredistinguished from cancer cell nuclei by their smaller size, morecircular shape, and a darker, homogeneous staining. Additionalchallenges include variations in the appearance of FIG. 4( b), FIG. 4(c) depicts BC nuclei within a single histopathology slide, FIG. 4( d)depicts the presence of fat among cancerous tissue, FIG. 4( e) depictshistological fixing, and FIG. 4( f) depicts slide digitizationartifacts.

FIG. 5 depicts a flowchart illustrating the 4 main steps in the CADxsystem and method for LI-based stratification of HER2+ BChistopathology. Automated lymphocyte detection is followed by featureextraction of architectural and morphological features. Thehigh-dimensional feature space is then non-linearly embedded into areduced dimensional space via Graph Embedding, which allows for datavisualization and subsequent evaluation via a SVM classifier.

FIG. 6 depicts a flowchart illustrating the main steps in the automatedlymphocyte detection system and method.

FIG. 7 depicts a schematic illustrating the iterative growth of a regionr. After initialization of the current region SCR (as depicted in FIG.7( a)), current boundary S_(CB), and bounding box S_(BB), new pixels areadded iteratively (as depicted in FIG. 7( b)). When a new pixel(outlined in white) is added to S_(CR), the boundaries S_(CB) and S_(IB)are adjusted accordingly as depicted in FIG. 7( c)).

FIG. 8( a) and FIG. 8( e) depict luminance channels of two differentHER2+ BC histopathology studies. FIG. 8( b) and FIG. 8( f) depictcorresponding initial region-growing based lymphocyte detection. FIG. 8(c) and FIG. 8( g) depict preliminary Bayesian refinement showingdetected BC nuclei (light circles) and detected lymphocyte nuclei (darkcircles). FIG. 8( d) and FIG. 8( h) depict final lymphocyte detectionresult after the MRF pruning step.

FIG. 9 depicts probability density functions (PDFs) estimated fromempirical training data and modeled via weighted sum of gammadistributions for FIG. 9 (a), FIG. 9 (c) ω_(l) and FIG. 9 (b), FIG. 9(d) ω_(b) classes for FIG. 9 (a), FIG. 9 (b) square root of area andFIG. 9 (c), FIG. 9 (d) variance in luminance of each r∈R. In eachdistribution depicts in FIGS. 9( a)-(d), the estimated parametric modelis overlaid.

FIG. 10 depicts two different HER2+ breast cancer histopathology imageswith FIG. 10 (a) high and FIG. 10 (b) low levels of LI. FIG. 10((b) andFIG. 10 (f)) show the corresponding Voronoi Diagrams constructed usingthe automatically detected lymphocyte centers as vertices of the graph.Corresponding Delaunay Triangulation and Minimum Spanning Tree graphsare shown in FIGS. 10((c), (g)) and 10((d), (h)), respectively.

FIG. 11 depicts a histogram of the partial, directed Hausdorff distancesΦ_(H)(

^(auto),

^(man)) between automatically and manually detected lymphocyte nuclei inall 41 BC histopathology images. The dashed line denotes the median ofthe errors of the automated lymphocyte detection system and method.

FIG. 12 depicts the mean (μ_(ACC)) classification accuracy over 100trials of 3-fold cross-validation is shown for differentdimensionalities {2, 3, . . . , 10} obtained via Graph Embedding. Theerror bars represent standard deviation (σ_(ACC)) of the classificationaccuracy.

FIG. 13 depicts all 41 images plotted in the Graph Embedding reduced3-dimensional Eigen space for the architectural feature set derived fromFIG. 13( a) manual and FIG. 13( b) automated lymphocyte detection.Embeddings of the Varma-Zisserman features are shown for FIG. 13 (c) K=3and FIG. 13( d) K=5. The labels denote samples with low LI (circles),medium LI, (squares), and high LI (triangles) as determined by an expertoncologist. Note that GE with the architectural features reveals thepresence of an underlying manifold structure showing a smooth continuumof BC samples with low, medium, and high levels of LI.

DETAILED DESCRIPTION

An image analysis system that can reproducibly and quantitativelycharacterize tissue architecture can be used to predict patient outcome.

The present invention relates to an image-based computer-aided prognosis(CAP) system and method that seeks to replicate the prognostic power ofmolecular assays in histopathology and pathological processes, includingbut not limited to cancer.

The present invention involves key methodological contributions and theuse of several state of the art machine learning system and methods,including, but not limited to: a robust, efficient method toautomatically detect BC nuclei; image features to describe thearchitectural arrangement of histological structures; and the use ofnon-linear dimensionality reduction to classify and visualize theunderlying biological data structure in a low-dimensionalrepresentation.

The present invention relates to a semi-automated nuclear detectionsystem and method based on the Expectation Maximization (EM) algorithm.The present invention relates to a semi-automated nuclear detectionsystem and method based on the Expectation Maximization (EM) algorithm.Additional information regarding Expectation Maximization algorithm maybe found in: Fatakdawala, H., Basavanhally, A., Xu, J., Ganesan, S.,Feldman, M., Tomaszewski, J., Madabhushi, A., Expectation Maximizationdriven Geodesic Active Contour with Overlap Resolution (EMaGACOR):Application to Lymphocyte Segmentation on Breast Cancer Histopathology,IEEE Trans. on Biomedical Engineering, 2010 (in press)” which isincorporated by reference herein.

In the present invention, architectural features are derived tocharacterize the arrangement of cancer nuclei, including but not limitedto breast cancer nuclei. In Doyle et al. [9], different graphs wereconstructed using BC nuclei as vertices and the quantitative featuresderived from these graphs were used to successfully stratify BC grade.In the present invention, Graph Embedding (GE), a nonparametric type ofnon-linear dimensionality reduction [9], is employed to project theimage-derived features from each BC tissue specimen onto a reduced 3Dspace, and subsequently, down to a 1D line. A Support Vector Machine(SVM) classifier [10] is employed to evaluate the discriminability ofthe architectural features with respect to BC grade in the reduced 3Dspace. The further projection of the image data to a 1D line allows usto define image-based risk scores, analogous to the Oncotype DX RS, forpoor, intermediate, and good outcome. This image-based risk scorepredictor could potentially supplant Oncotype DX to predict BC outcomeand survival.

In an embodiment of the present invention, a computerized system andmethod is disclosed to automatically detect and grade the extent of LIin digitized HER2+ BC histopathology. Lymphocytes are firstautomatically detected by a combination of region growing and MarkovRandom Field algorithms. Using the centers of individual detectedlymphocytes as vertices, three graphs (Voronoi Diagram, DelaunayTriangulation, and Minimum Spanning Tree) are constructed and a total of50 image-derived features describing the arrangement of the lymphocytesare extracted from each sample. A non-linear dimensionality reductionsystem and method, Graph Embedding, is then used to project thehigh-dimensional feature vector into a reduced 3D embedding space. ASupport Vector Machine classifier is used to discriminate samples withhigh and low LI in the reduced dimensional embedding space.

An embodiment of the present invention relates to an image-based riskscore predictor method for measuring cancer extent to evaluate diseaseoutcome in cancer patients using digitized histopathology comprising: i.scanning stained histopathology slides into a computer using a highresolution whole slide scanner; ii. detecting cancer nuclei using anExpectation-Maximization based algorithm; iii. constructing the DelaunayTriangulation and Minimum Spanning Tree graphs using the centers ofindividual detected cancer nuclei as vertices; iv. extractingimage-derived features describing the arrangement of the cancer nucleifrom each image; v. projecting high-dimensional image derived featurevector into a reduced 3D embedding space via Graph Embedding; vi.unwrapping the 3D embedding into a 1D scale to define image-based riskscores for poor, intermediate, and good outcome; vii. determining animage-based recurrence score to distinguish between low, intermediate,and high cancer grades by uncovering the grade labels of the samples onthe 1D line and their relative locations to predict prognosis.

An embodiment of the present invention relates to an image-based riskscore predictor method for measuring extent of pathological processes toevaluate disease outcome in patients using digitized histopathologycomprising: i. scanning stained histopathology slides into a computerusing a high resolution whole slide scanner; ii. detecting pathologicalnuclei using an Expectation-Maximization based algorithm; iii.constructing the Delaunay Triangulation and Minimum Spanning Tree graphsusing the centers of individual detected pathological nuclei asvertices; iv. extracting image-derived features describing thearrangement of the pathological nuclei from each image; v. projectinghigh-dimensional image derived feature vector into a reduced 3Dembedding space via Graph Embedding; vi. unwrapping the 3D embeddinginto a 1D scale to define image-based risk scores for poor,intermediate, and good outcome; vii. determining an image-basedrecurrence score to distinguish between low, intermediate, and highcancer grades by uncovering the grade labels of the samples on the 1Dline and their relative locations to predict prognosis.

An embodiment of the present invention relates to an image-based riskscore predictor method for measuring cancer extent to evaluate diseaseoutcome in node-negative, estrogen receptor-positive breast cancerpatients using digitized histopathology comprising: i. scanning stainedhistopathology slides into a computer using a high resolution wholeslide scanner; detecting cancer nuclei using an Expectation-Maximizationbased algorithm; iii. constructing the Delaunay Triangulation andMinimum Spanning Tree graphs using the centers of individual detectedcancer nuclei as vertices; iv. extracting image-derived featuresdescribing the arrangement of the cancer nuclei from each image; v.projecting high-dimensional image derived feature vector into a reduced3D embedding space via Graph Embedding; vi. unwrapping the 3D embeddinginto a 1D scale to define image-based risk scores for poor,intermediate, and good outcome; vii. determining an image-basedrecurrence score to distinguish between low, intermediate, and highcancer grades by uncovering the grade labels of the samples on the 1Dline and their relative locations to predict prognosis.

An embodiment of the present invention relates to a method for measuringextent of lymphocytic infiltration to evaluate disease outcome in breastcancer patients expressing the human epidermal growth factor receptor 2(HER2) comprising: i. scanning stained histopathology slides into acomputer using a high resolution whole slide scanner; ii. detectinglymphocyte nuclei using a combination of region-growing and MarkovRandom Field algorithms; iii. constructing a Voronoi Diagram, DelaunayTriangulation, and Minimum Spanning Tree using the centers of individualdetected lymphocyte nuclei as vertices; iv. extracting image-derivedfeatures describing the arrangement of the lymphocyte nuclei from eachimage; v. projecting high-dimensional image derived feature vector intoa reduced 3D embedding space via Graph Embedding.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS Exemplary Embodiment 1Automated Detection of Nuclei Using EM-Based Gaussian Mixture

The present invention relates to an image-based computer-aided prognosis(CAP) system and method that seeks to replicate the prognostic power ofmolecular assays in BC histopathology. Using only the tissue slidesamples, a mechanism for digital slide scanning, and a computer, ourimage-based CAP system and method aims to overcome many of the drawbacksassociated with Oncotype DX, including the high cost associated with theassay; limited laboratory facilities with specialized equipment, andlength of time between biopsy and prognostic prediction.

Dataset

A total of 37 hematoxylin and eosin (H & E) stained breasthistopathology images were collected from a cohort of 17 patients andscanned into a computer using a high resolution whole slide scanner at20× optical magnification. For all methods, we define the data set Z={

₁,

₂, . . .

} of

images, where an image

=(

, g) is a 2D set of pixels c∈

and g is the associated intensity function. Each

is associated with an architectural feature set F(

), an Oncotype DX RS L^(RS)(

)∈{1, 2, . . . , 100}, and BC grade L^(GR)(

)∈{LG, MG, HG}, where LG, MG, and HG represent low-, medium-, andhigh-grade cancer, respectively. The 37 samples were also classifiedbased on their RS, where L^(RS) is binned into good (RS<22),intermediate (23<RS≦30), and poor (31<RS<100) prognosis categories.

EM-Based Segmentation of Cancer Nuclei

To segment BC nuclei, each image C is modeled as a Gaussian mixture ofK=5 components, where

={1, 2, . . . , K}. We optimize the model parameter set γ^(i)={

,

,

:

}, comprising the mean

, covariance

, and a priori probability

at iteration i. The mixture is initialized to γ⁰ via K-means clusteringof RGB values over all c∈

. The Expectation step calculates the posterior probability

${{p^{i}\left( {\kappa {g(c)}} \right)} = \frac{p_{\kappa}^{i}{N\left( {{{g(c)}\mu_{\kappa}^{i}},\sigma_{\kappa}^{i}} \right)}}{\sum\limits_{j = 1}^{K}{p_{j}^{i}{N\left( {{{g(c)}\mu_{j}^{i}},\sigma_{j}^{i}} \right)}}}},$

where N(g(c)|

,

) represents the value of Gaussian component

at intensity g(c). The Maximization step uses p^(i) to calculateγ^(i+1)={

,

,

} [5]. The EM algorithm converges when ||(

^(i+1)−

^(i))/

^(i)||₂<ε, where

^(i) is the log likelihood of the Gaussian mixture model with parametersγ^(i) and ε=10⁻⁵ is determined empirically. Based on posteriorprobability, a grayscale “scene” over all c∈

(FIG. 1( b)) is saved for each

={1, 2, . . . , K}.

The scene that best represents BC nuclei (FIG. 1( b)) is selectedmanually and smoothed (FIG. 1( c)) to reduce intra-nuclear intensityvariations. Morphological and connected component operations are thenapplied to identify individual objects corresponding to BC nuclei andthe corresponding set of n nuclear centroids V={υ₁, υ₂, . . . , υ_(n)}is found for each

∈Z (FIG. 1( d)).

Feature Extraction

A complete, undirected graph

={V, E, W} is defined by a vertex-set of nuclear centroids V, anedge-set E={E₁, E₂, . . . , E_(m)} connecting the nuclear centroids suchthat (υ₁, υ₂)∈E with υ₁, υ₂∈V, and a set of weights W={W₁, W₂, . . . ,W_(m)} proportional to the length of each E∈E. A total of 12architectural features are extracted for each image based on theDelaunay Triangulation and a Minimum Spanning Tree [9] (FIG. 2). Belowwe describe the construction of these 2 graphs.

Delaunay Triangulation

A Delaunay graph

=(V,

,

) (FIG. 2( b)) is a spanning subgraph of

that is easily calculated from the Voronoi Diagram

. Each

is defined by a set of polygons P={P(υ₁), P(υ₂), . . . , P(υ_(n))}surrounding all nuclear centroids V. Each pixel c∈

is linked to the nearest v∈V (via Euclidean distance) and added to theassociated polygon P(v)∈P. The Delaunay graph

is simply the dual graph of

and is constructed such that if P(υ_(a)), P(υ_(b))∈P share a side, theirnuclear centroids υ_(a), υ_(b)∈V are connected by an edge (υ_(a),υ_(b))∈

. The mean, standard deviation, minimum/maximum (min/max) ratio, anddisorder are calculated for the side length and area of all triangles in

, providing a set of 8 features

for each

∈Z.

Minimum Spanning Tree

A spanning tree

_(S)=(V, E_(S), W_(S)) refers to any spanning subgraph of

. The total weight Ŵ_(S) for each subgraph is calculated by summing allindividual weights W∈W_(S). The Minimum Spanning Tree (FIG. 2( c))

_(MST)=arg

Ŵ_(S) is the subgraph with the lowest total weight. The mean, standarddeviation, min/max ratio, and disorder of the branch lengths in

_(MST) provide a set of 4 features f_(MST) for each

∈Z.

Dimensionality Reduction Using Graph Embedding and Support VectorMachine Based Classification Projecting Data to a 3D Space via GraphEmbedding

We use Graph Embedding (GE) to transform the architectural feature setinto a low-dimensional embedding [9]. For each

∈Z, a 12-dimensional architectural feature set is defined as thesuperset F(

)={

, f_(MST)} containing all features derived from Delaunay Triangulationand Minimum Spanning Tree. Given histopathology images

_(a),

_(b)∈Z, a confusion matrix

(a, b)=exp(−||F(

_(a))−F(

_(b))||₂)∈

is first constructed ∀a, b. The optimal embedding vector F′ is obtainedfrom the maximization of the function,

${{ɛ\left( F^{\prime} \right)} = {2{\left( {\mathcal{M} - 1} \right) \cdot {{trace}\left\lbrack \frac{{F^{\prime^{T}}\left( { - } \right)}F^{\prime}}{F^{\prime^{T}}\; F^{\prime}} \right\rbrack}}}},$

where

(a, a)=Σ_(b)

(a, b). The low-dimensional embedding space is defined by the Eigenvectors corresponding to the β smallest Eigen values of (

−

)F′=λ

F′.

Reducing the high-dimensional feature space to a 3D Eigen subspaceallows us to evaluate the discriminability of the image-derived featuresin distinguishing samples with different cancer grade patterns and hencedifferent prognoses.

SVM-Based Classification via Cross-Validation

A support vector machine (SVM) classifier [10] is trained usingimage-derived features to distinguish images with different grades usinga k-fold cross-validation system and method. The data set Z is dividedinto training Z_(tra) and testing Z_(tes) subsets, whereZ_(tra)∩Z_(tes)=. The SVM classifier projects F(Z_(tra)) onto a higherdimensional space using a linear kernel and the hyperplane that mostaccurately separates the two classes is determined. The classifier isevaluated by projecting F(Z_(tes)) and comparing all

∈Z_(tes) to the hyperplane. Image

is said to be correctly classified if its SVM-derived class matches theclinician's ground truth label.

SVM training is performed via stratified, randomized k-foldcross-validation algorithm, whereby Z is divided randomly into ksubsets. The samples from k−1 subsets are pooled into Z_(tra) and theremaining subset is used as Z_(tes). For each of the k iterations, adifferent subset is used as Z_(tes) while the remaining subsets are usedfor Z_(tra). Using a value of k=3, the entire cross-validation algorithmis repeated for 100 trials and the resulting mean μ_(ACC) and standarddeviation σ_(ACC) of the classification accuracy are determined.

Geodesic Distance-Based Projection from 3D to 1D

The 3D GE manifold can be “unwrapped” into a 1D (linear) space simply byselecting the image

₁ at one end of the manifold as an anchor point and using the Euclideandistance metric to find the next image nearest to it on the 3D manifold.By using

_(a) as the new anchor point, this process is repeated until all

∈Z are included. Thus the geodesic distances between all scenes Cembedded on the manifold are determined and GE is again employed toproject the data down to a 1D line. By uncovering the grade (outcome)labels of the samples on this 1D projection and their relativelocations, an image-based recurrence score can be determined todistinguish between low, intermediate, and high BC grades (and henceoutcomes). For any new image

_(b) projected onto this line, the relative distance of

_(b) from poor, intermediate, and good outcome samples on the trainedmanifold will enable prediction of prognosis for

_(b).

Results and Discussion Image-Based Discrimination of Grade

A SVM trained via 3-fold cross-validation on the unreduced F and reduced(3D) F′ architectural feature sets was able to distinguish high and lowgrade BC histopathology images with classification accuracies of74.82%±5.74% and 84.12%±5.42%, respectively, over 100 runs (Table 1).These results appear to confirm that GE has embedded the architecturalfeature set without any significant loss of information. The success ofthe architectural features is confirmed qualitatively by the clearseparation between high and low BC grade on the 3D manifold (FIG. 3(a)).

TABLE 1 μ_(ACC) and σ_(ACC) over 100 trials of 3-fold cross-validationfor both automatically and manually detected BC nuclei. Results arereported for the original F and low-dimensional F′ feature sets usingboth the RS (L^(RS)) and cancer grade (L^(GR)) labels. AutomatedDetection Manual Detection RS (F′) 84.15% ± 3.10% 71.92% ± 4,66% Grade(F′) 84.12% ± 5.42% 85.71% ± 4.89% RS (F) 84.56% ± 2.69% 71.65% ± 5.88%Grade (F) 74.82% ± 5.74% 85.00% ± 4.51%

Correlation Between Image-Based Signatures in Grade Discrimination andOncotype DX Recurrence Scores

Replacing the grade labels with the RS labels, the SVM trained via3-fold cross-validation on F′ and F yielded classification accuracies of84.15%±3.10% and 84.56%±2.69%, respectively (Table 1). This shows that acorrelation exists between molecular prognostic assays such as OncotypeDX and the spatial arrangement of nuclei and histological structures inBC histopathology. The 3D manifolds in FIGS. 3( a), (b) reveal a similarunderlying biological stratification that exists in BC grade andOncotype DX RS, in turn suggesting that the quantitative imageinformation employed to characterize BC grade could recapitulate theprognostic capabilities of Oncotype DX. The curvilinear 3D manifold onwhich the different BC grades (low to high) reside in a smooth continuummay potentially offer insight into BC biology as well.

Creating an Image-Based Assay Using 1D Projection

FIGS. 3( c), (d) represent the 1D projections of the 3D manifolds shownin FIGS. 3( a), (b), respectively. The manifolds reveal a smooth,continuous progression from low to medium to high levels in terms ofboth RS and histological (grade) for all LN−, ER+ BC samples considered.The similarity between the 1D manifolds (FIGS. 3( c), (d)) suggest thatour image-based CAP system can be used to generate a prognostic assay topredict survival scores in much the same way as Oncotype DX.

Exemplary Embodiment 2

The ability to automatically detect LI would be invaluable to BCpathologists and oncologists, since manual detection of individuallymphocyte nuclei in BC histopathology is a tedious and time-consumingprocess that is not feasible in the clinical setting. The availabilityof a computerized image analysis system and method for automatedquantification of LI extent in HER2+ BC will enable development of aninexpensive image-based system for predicting disease survival andoutcome.

An important prerequisite for extracting histopathological imageattributes to model BC appearance is the ability to automatically detectand segment histological structures, including nuclei and glands.Consequently the ability of an image analysis system and method to gradethe extent of LI in a BC histopathology image is dependent on thealgorithm's ability to automatically detect lymphocytes. Automatedlymphocyte detection, however, is a non-trivial task complicated by theintrinsic similarity in appearance of BC nuclei and lymphocyte nuclei onhematoxylin and eosin (H & E) stained breast biopsy samples (FIG. 4(a)). In addition, even within a particular slide, the morphology of BCnuclei is highly heterogeneous due to variations in cancer grade andmitotic phase (FIGS. 4( b), (c)) [22]. Biological differences such asthe presence of fat deposits (FIG. 4( d)) can confound algorithms thatrely on boundary detection alone. Preparation issues such as “cuttingartifact” (FIG. 4( e)) and digitization misalignment (FIG. 4( f)) leadto similar problems, but are more difficult to predict and correct forsince they are unrelated to the underlying biology.

To address some of the challenges in automated detection of lymphocytes(as illustrated in FIG. 4), the present invention provides acomputerized image analysis system and method that combines aregion-growing algorithm with Maximum a Posteriori (MAP) estimation andMarkov Random Field (MRF) theory [31]. First, all candidate BC andlymphocyte nuclei are detected via a region-growing algorithm that usescontrast measures to find optimal boundaries [31], [32]. By growingoutward from the center of each nucleus, this technique is robust toartifacts outside of the nuclei (FIGS. 4( d)-(f)). The region-growingalgorithm has high detection sensitivity, resulting in a large number oflymphocyte and non-lymphocyte nuclei being detected. MAP estimationimproves detection specificity by incorporating size and luminanceinformation from each detected object to temporarily label it as eithera BC or lymphocyte nucleus (these being the 2 main classes of objectsdetected). MRF theory [31], [33] then allows us to improve lymphocytedetection specificity by modeling the infiltration phenomenon in termsof spatial proximity, whereby an object is more likely to be labeled asa lymphocyte nucleus if it is surrounded by other lymphocyte nuclei. Theapplication of MRF is a unique step that exploits the spatial propertiesof LI to (1) distinguish nuclei that would be otherwise misclassified(FIG. 4( a)) and (2) isolate infiltrating lymphocytes from thesurrounding baseline level of lymphocytes. We achieve MAP estimation byusing the Iterated Conditional Modes algorithm, a fast and simple methodfor maximizing the posterior probability that a detected object isindeed a lymphocyte [31], [34].

The importance of using graph algorithms to quantitatively describe thespatial arrangement of nuclei in distinguishing cancer grade in bothprostate cancer and BC histopathology has been previously shown [9],[35]. In [9], quantitative features derived from graphs (VoronoiDiagram, Delaunay Triangulation, and Minimum Spanning Tree) constructedusing BC nuclei as vertices were used to successfully stratify low,intermediate, and high BC grade on digitized histopathology.

The present invention quantifies the extent of LI in HER2+ BChistopathology by extracting graph-based and nuclear features tocharacterize the architectural arrangement and morphological appearanceof lymphocytes [9]. Traditional textural signatures such as first ordergrey level features, second order Haralick statistics, and Gabor filterfeatures were not considered in the present invention because they havebeen shown to be unsuitable for CADx applications in breast and prostatecancer that rely on spatial information [9], [35].

While a large set of descriptive features is certainly desirable formodeling biological processes such as LI, a high-dimensional featurespace also presents problems for data classification analysis. First,the curse of dimensionality [36] affects computational efficiency due tothe exponential increase in data volume required for each additionalfeature. Further, it is impossible to directly visualize therelationships between images in a high-dimensional space. The presentinvention addresses both issues by utilizing Graph Embedding (GE) [9],[37], a nonparametric, non-linear dimensionality reduction system andmethod [38], to first project the image-derived features onto a reduced3D space while simultaneously preserving object-class relationships.Thus, 2 samples with low levels of LI would be projected relativelyclose to each other and relatively farther from a sample with a highlevel of LI.

A Support Vector Machine (SVM) classifier [38], [39] is then employed inthe reduced dimensional space to discriminate between images with highand low LI. We have previously shown the utility of SVM classifiers inconjunction with architectural and morphological image features todistinguish malignant and benign BC samples and also for distinguishingbetween different grades of BC [9] and prostate cancer [35]. SVM resultsfor our architectural features are also compared against results fromthe popular, unsupervised Varma-Zisserman (VZ) texton-based system andmethod.

The main components of the methodology for automated detection andstratification of LI on BC histopathology of the present invention isillustrated in the flowchart in FIG. 5.

Dataset Description and Notation

A total of 41 hematoxylin and eosin (H & E) stained breast biopsysamples from 12 patients at The Cancer Institute of New Jersey (CINJ)were obtained and scanned into a computer using a high resolution wholeslide scanner at 20× optical magnification (0.33 μm spatial resolution).The size of each image falls within 600≦U_(X)≦700 and 500≦U_(Y)≦600,where U_(X) and U_(Y) are the width and height, respectively, in pixels.These images were separated into 3 classes by a BC oncologist based onthe extent of LI. The dataset comprises 22 low, 10 medium, and 9 high LIsamples. For the purpose of quantitative classification (as described inSection VI-B), the oncologist separated the images into two classescomprising 22 low LI and 19 high LI samples, respectively.

We define a dataset Z={

₁,

₂, . . . ,

} of

images. An image scene

∈Z is defined as

=(

, g), where

is a 2D set of pixels c∈

and g is the associated luminance function from the CIE-Lab color space[41]. A list of symbols and notation commonly used in this paper isshown in Table I. The CIE-Lab color space has the advantage of beingmore perceptually uniform and more robust to variations in staining anddigitization than RGB space (FIG. 4 (a)) [20], [40].

TABLE 1 A list of key notation used in this paper. Symbol Description Z= { 

₁,  

₂, . . . ,  

 } HER2+ BC histopathology dataset comprising  

  digitized Images

  = (C, g) Image scene defined by a set of pixels (C) and luminancefunction (g) N = {n₁, n₂, . . . , n_(N)} N candidate lymphocyte nucleicenters in image scene 

R = {r₁, r₂, . . . , r_(N)} N candidate regions grown front N

  = {o₁, o₂, . . . , o_(L)} L finalized lymphocyte nuclei centers inimage scene 

, where  

  ⊂ N R Set of pixels representing lymphocyte nucleus region S*_(CR)X_(r) ε {ω_(b), ω

} Random variable denoting class BC (ω_(b)) or lymphocyte ( ω

) nucleus for each region r ε R Y_(r) = [A_(r), σ_(r)]

 ε 

⁺² Random variable denoting features square root of area (A) and std.dev. intensity (σ) for each region r ε R

Specific instances of X_(r) and Y_(r) x, y Sets of x_(r), ∀r ε R andy_(r), ∀r ε R

  = { 

, E, W} Graph with vertex-set 

, edge-set E, and weights W F( 

) Architectural feature set for image scene 

F

( 

) Low-dimensional embedding of architectural feature set for imagescene 

 ( 

) ε {+1, −1} True label for image scene 

 as determined by expert pathologist, such that +1 represents high LIand −1 represents low LI.

indicates data missing or illegible when filed

Automated Nuclear Detection of Lymphocytes

Beginning with a set of N candidate lymphocyte nuclear centers N={n₁,n₂, . . . , n_(N)}, we attempt to identify a set of L finalizedlymphocyte nuclei with centers given by

={o₁, o₂, . . . , o_(L)}, such that

⊂N. The following sections detail the region-growing, Maximum aPosteriori (MAP) estimation, and Markov Random Field (MRF) algorithmsthat comprise the lymphocyte detection module of our CADx system (FIG.6).

A. Candidate Lymphocyte Detection via Region-Growing

We first attempt to identify candidate image locations that couldrepresent centers of lymphocytic nuclei. The region-growing algorithmexploits the fact that lymphocyte nuclei in the luminance channel areidentified as continuous, circular regions of low intensitycircumscribed by sharp, well-defined boundaries (FIG. 4). The imagescene C is convolved with a Gaussian (smoothing) kernel at multiplescales σ_(G)∈{6, 7, 8} μm to account for variations in lymphocyte size.After convolution at each scale, valleys (i.e. the darkest pixels) arefound on the smoothed image based on local differences in luminance.These valleys define a set of seed points N={n₁, n₂, . . . , n_(N)} thatrepresent candidate lymphocyte centers on the original scene

. Each n∈N is grown into a corresponding region r∈R using the 4-stepprocedure described below.

Additional details on the region-growing system and method may beobtained from Reference [32].

Step 1: A set of current pixels S_(CR)={n} is initialized as shown inFIG. 7( a). The current boundary S_(CB) is defined as the set of8-connected pixels surrounding S_(CR). A square bounding box S_(BB)containing all pixels within a 12σ_(G)×12σ_(G) neighborhood around n isthen constructed.

Step 2: The pixel c∈S_(CB) with the lowest intensity in the currentboundary is identified. Pixel c is removed from S_(CB) and added toS_(CR). The current boundary S_(CB) is updated to include every pixeld∈S_(BB) that is an 8-connected neighbor of c and d∉S_(CR). A set ofinternal boundary pixels S_(IB)⊂S_(CR) (FIGS. 7( b), (c)) is defined asall pixels in S_(CR) that are 8-connected to any pixel in S_(CB).

Step 3: g _(IB) and g _(CB) are computed as the mean intensity of pixelsin S_(IB) and S_(CB), respectively. The boundary strength is computed ateach iteration as g _(IB)− g _(CB).

Step 4: Steps 2 and 3 are iterated until the current region S_(CR) triesto add a pixel outside the bounding box S_(BB). The optimal lymphocyteregion S*_(CR) is identified at the iteration for which the boundarystrength g _(IB)− g _(CB) is maximum (FIGS. 8( b), (f)).

Since the region-growing procedure is repeated with seed points from avariety of smoothing scales σ_(G)∈{6, 7, 8} μm, overlapping regions areresolved by discarding the region with the lower boundary strength. Forthe sake of convenience we will refer to S*_(CR) as R below.

B. Bayesian Modeling of Lymphocytic Infiltration via Maximum aPosteriori Estimation

The initial lymphocyte detection is refined by incorporating domainknowledge regarding lymphocyte size, luminance, and spatial proximity.Each r∈R has two associated random variables: X_(r)∈Λ≡{ω_(b), ω_(l)}indicating its classification as either a breast cancer (ω_(b)) orlymphocyte (ω_(l)) nucleus and Y_(r)≡[A_(r), σ_(r)]^(T∈)

⁺² denoting the two observed features

$\begin{matrix}{{A_{r} = \sqrt{R}},} & (1) \\{{\sigma_{r} = \sqrt{\frac{1}{R}{\sum\limits_{c \in R}\left( {{g(c)} - \overset{\_}{g}} \right)^{2}}}},} & (2)\end{matrix}$

where A_(r) is the square root of nuclear area (Equation 1), σ_(r) isthe standard deviation of luminance in the nuclear region (Equation 2),|R| is the cardinality of R, and

$\overset{\_}{g} = {\frac{1}{R}{\sum\limits_{c \in R}{g(c)}}}$

is the average pixel intensity of R. The choice of the two features(A_(r) and σ_(r)) is motivated by the fact that (a) BC nuclei aretypically larger than lymphocyte nuclei and (b) BC and lymphocyte nucleiare significantly different in terms of the homogeneity of theirluminance values. Specific instances of the random variables X_(r) andY_(r) are denoted by x_(r)∈Λ and y_(r)=[A_(r), σ_(r)]^(T)∈

⁺², respectively. We define the random variables collectively for allr∈R as X={X₁, X₂, . . . , X_(N)} and Y={Y₁, Y₂, . . . , Y_(N)} withstate spaces Ω=Λ^(N) and

^(+2×N), respectively. Instances of X and Y are denoted by variablesx=(x₁, x₂, . . . , x_(N))∈Ω and y=(y₁, y₂, . . . , y_(n))∈

^(+2×N).

The labels X=x, given the feature vectors Y=y, are estimated usingMaximum a Posteriori (MAP) estimation [42], which advocates finding thex that maximizes the posterior probability

$\begin{matrix}{{{p\left( {xy} \right)} = {\frac{{p\left( {yx} \right)}{p(x)}}{p(y)} \propto {{p\left( {yx} \right)}{p(x)}}}},} & (3)\end{matrix}$

where p(y|x) is the likelihood term and p(x), p(y) are priordistributions for x and y, respectively. Since maximization of Equation3 is only with respect to x, the prior distribution p(y) is ignored.

1) Modeling Lymphocyte Features via Trained Probability Distributions:The likelihood term p(y|x) in Equation 3 is calculated from probabilitydensity functions (PDFs), where x is provided by manual delineation oflymphocytes in a training set. Under the assumption that y isindependent and identically distributed, the likelihood term in Equation3 can be simplified such that

$\begin{matrix}{{p\left( {yx} \right)} = {\prod\limits_{r \in R}{{p\left( {y_{r}x_{r}} \right)}.}}} & (4)\end{matrix}$

Each 2-dimensional probability density function (PDF) is modeled as theproduct of two independent distributions: p(y_(r)|x_(r))=

(A_(r)|x_(r))

(σ_(r)|x_(r)). Thus we require four one-dimensional PDFs

(A_(r)|ω_(b)),

(A_(r)|ω_(l)),

(σ_(r)|ω_(b)), and

(σ_(r)|ω_(l)) as shown in FIG. 9. To reduce local irregularities andcreate a smooth, continuous distribution, the one-dimensional PDFs aremodeled by mixtures of Gamma distributions [42]

$\begin{matrix}{{{\overset{\_}{\Gamma}\left( {{z;\delta},\varphi,t} \right)} = {{\delta \; z^{t_{1} - 1}\frac{^{{- z}/\varphi_{1}}}{\varphi_{1}^{t_{1}}{\Gamma \left( t_{1} \right)}}} + {\left( {1 - \delta} \right)z^{t_{2} - 1}\frac{^{{- z}/\varphi_{2}}}{\varphi_{2}^{t_{2}}{\Gamma \left( t_{2} \right)}}}}},} & (5)\end{matrix}$

where z∈

⁺, δ∈[0, 1] is the mixing parameter, t₁, t₂>0 are the shape parameters,φ₁, φ₂>0 are the scale parameters, and Γ is the Gamma function [42].Calculating p(y|x) allows us to estimate Equation 3 and assign atentative class x_(r)∈{ω_(b), ω_(l)} to each r∈R (FIGS. 8( c), (g)).

2) Modeling Lymphocyte Proximity via Markov Random Fields: The priordistribution p(x) (Equation 3) is defined by a Markov Random Field(MRF). The Markov property [33] states that

p(x _(r) |x _(r))=p(x _(r) |x _(η) _(r) ),   (6)

where the neighborhood η_(r) is empirically assumed to contain allregions within a 30 μm radius of r, x_(−r)={x_(s):s ∈R, s≠r}, and x_(η)_(r) ={x_(s):s∈η_(r)}. We use the Iterated Conditional Modes (ICM)algorithm [34], a deterministic relaxation procedure, to perform MAPestimation (Equation 3) and assign a hard label x_(r)∈{ω_(b), ω_(l)} toeach r∈R. Thus each region is classified as either a BC or lymphocytenucleus (FIGS. 8( d), (h)). The regions labeled as BC nuclei arediscarded, while centers of the L lymphocyte nuclei regions are computedand stored as

={o₁, o₂, . . . , o_(L)}.

Feature Extraction

We define the complete, undirected graph

=(

, E, W), where

={o₁, o₂, . . . , o_(L)} is the set of vertices corresponding to the setof lymphocyte nuclear centroids, E={E₁, E₂, . . . , E_(m)} is the set ofedges connecting the nuclear centroids such that {(o_(i),o_(j))∈E:∀o_(i), o_(j)∈

, i, j∈{1, 2, . . . , L}, i≠j}, and W={W₁, W₂, . . . , W_(m)} is a setof weights proportional to the length of each E∈E. To extractinformation about the arrangement of lymphocyte nuclei, we constructsubgraphs representing the Voronoi Diagram

_(V), Delaunay Triangulation

, and Minimum Spanning Tree

_(MST) (FIG. 10). In addition, statistics describing the number anddensity of nuclei are calculated directly from

.

A. Voronoi Diagram

The Voronoi graph

_(V)=(

, E_(V), W_(V)) (FIGS. 10( b), (f)) is a spanning subgraph of

defined as a set of polygons P={P₁, P₂, . . . , P_(L)} surrounding allnuclear centroids

[43]. Each pixel c∈

is linked with the nearest centroid o∈

(via Euclidean distance) and added to the associated polygon P∈P. Themean, standard deviation, minimum/maximum (min/max) ratio, and disorder(i.e. standard deviation divided by the mean) are calculated for thearea, perimeter length, and chord length over all P, yielding a set of13 features (f_(V)) for each scene

(Table II).

B. Delaunay Triangulation

The Delaunay graph

=(

,

,

) (FIGS. 10( c), (g)) is a spanning subgraph of

and the dual graph of

_(V) [43]. It is constructed such that if P_(i), P_(j)∈P share a side,where i,j∈{1, 2, . . . , L}, their nuclear centroids o_(i), o_(j)∈

are connected by an edge (o_(i), o_(j))∈

. The mean, standard deviation, min/max ratio, and disorder arecalculated for the side length and area of all triangles in

, yielding a set of 8 features (

) for each scene

(Table II).

C. Minimum Spanning Tree

A spanning tree

=(

, E_(S), W_(S)) refers to any spanning subgraph of

[43]. The total weight Ŵ_(S) for each subgraph is determined by summingall individual weights W∈W_(S). The Minimum Spanning Tree

_(MST) (FIGS. 10( d), (h)) is the spanning tree with the lowest totalweight such that

_(MST)=arg

[Ŵ_(S)]. The mean, standard deviation, min/max ratio, and disorder ofthe branch lengths in

_(MST) yield a set of 4 features (f_(MST)) for each scene

(Table II).

D. Nuclear Features

The global density

$\frac{L}{C}$

of lymphocyte nuclei is calculated for each scene

where L is the total number of detected lymphocytes and |

| represents the number of pixels (cardinality) in

. For any nuclear centroid o_(i)∈

, we define a corresponding nuclear neighborhoodη^(ζ)(o_(i))={o_(j):||o_(i)−o_(j)||₂<ζ, o_(j)∈

, o_(j)≠o_(i)}, where ζ∈{10, 20, . . . , 50} and ||·||₂ is the L2 norm.The mean, standard deviation, and disorder of η^(ζ)(o_(i)), ∀o_(i)∈

are calculated. Additionally we estimate the minimum radius ζ* such that|η^(ζ*) (o_(i))|∈{3, 5, 7} and calculate the mean, standard deviation,and disorder over all o_(i)∈

. A total of 25 nuclear features (f_(NF)) are extracted for each scene

(Table II).

TABLE 2 A breakdown of the 50 architectural features, comprising 25graph-based and 25 nuclear attributes. Feature Set Description No. offeatures f_(V) Total area of all polygons 13 Polygon area: mean, stddev., min/max ratio, disorder Polygon perimeter: mean, std dev., min/maxratio, disorder Polygon chord length: mean, std dev., min/max ratio,disorder f_(D) Triangle side length: mean, std dev.,  8 min/max ratio,disorder Triangle area: mean, std dev., min/max ratio, disorder f_(MST)Edge length: mean, std dev.,  4 min/max ratio, disorder f_(NF) Densityof nuclei 25 Distance to {3, 5, 7} nearest nuclei: mean, std dev.,disorder Nuclei in ζ ε {10, 20, . . . , 50} pixel radius: mean, stddev., disorder

Non-Linear Dimensionality Reduction via Graph Embedding

Graph Embedding (GE) is employed to non-linearly transform thehigh-dimensional set of image features into a low-dimensional embeddingwhile preserving relative distances between images from the originalfeature space [9]. For each scene

, a 50-dimensional image feature set is defined as the supersetF={f_(V),

, f_(MST), f_(NF)} containing all features derived from the VoronoiDiagram, Delaunay Triangulation, Minimum Spanning Tree, and nuclearstatistics. Given histopathology images

_(a) and

_(b) with corresponding image feature sets F(

_(a)) and F(

_(b)), where a, b∈{1, 2, . . . ,

}, a

×

confusion matrix

_(F)(a, b)=exp(−||F(

_(a))−F(

_(b))||₂)∈

is constructed. The optimal embedding vector F′ is obtained from themaximization of the following function,

$\begin{matrix}{{{ɛ\left( F^{\prime} \right)} = {2{\left( {\mathcal{M} - 1} \right) \cdot {{trace}\left\lbrack \frac{{F^{\prime^{T}}\left( { - _{F}} \right)}F^{\prime}}{F^{\prime^{T}}\; F^{\prime}} \right\rbrack}}}},} & (7)\end{matrix}$

where

is a diagonal matrix defined ∀a∈{1, 2, . . . ,

} as

(a, a)=Σ_(b)

_(F)(a, b). The lower-dimensional embedding space is defined by theEigen vectors corresponding to the β smallest Eigen values of (

−

_(F))F′=λ

F′. In this work an empirically-determined value of β=3 was used. Thematrix F′(Z)∈

of the first β Eigen vectors is constructed such that F′(Z)={F′(

₁), F′(

₂), . . . , F′(

)}.

Evaluation Methods A. Hausdorff Distance for Evaluation of AutomatedLymphocyte Detection System and Method

The automated lymphocyte detection system and method is evaluated by theHausdorff distance [44], a similarity measure used to compare thefidelity of automated detection against the “gold standard” obtained bymanual inspection. For each image scene

, lymphocyte centroids from the automated (υ∈

^(auto)) and manual (u∈

^(man)) detection system and methods are identified. The centroidlocations in

^(man) were estimated exhaustively by an expert pathologist who manuallylabeled the individual lymphocytes in each scene. The partial, directedHausdorff distance is calculated for

^(auto) with respect to

^(man) as,

$\begin{matrix}{{{\Phi_{H}\left( {^{auto},^{man}} \right)} = {\min\limits_{u \in ^{man}}{{\upsilon - u}}_{2}}},{\forall{\upsilon \in {^{auto}.}}}} & (8)\end{matrix}$

B. Cross-Validation Using Support Vector Machine Classifier forQuantitative Evaluation of Architectural Features

The SVM classifier [39] is employed to evaluate the ability of the imagedescriptors to discriminate between high and low levels of LI in HER2+BC histopathology images. We construct the SVM classifier by using aGaussian kernel function Π to project training data Z_(tra)⊂Z onto ahigher-dimensional space. This high-dimensional representation allowsthe SVM to construct a hyperplane to separate the two classes (i.e. highand low LI). The classifier is then evaluated by projecting testing dataZ_(tes)⊂Z into the same space and recording the locations of the newlyembedded samples with respect to the hyperplane.

Given BC histopathology images

_(a),

_(b)∈Z_(tra) with corresponding low-dimensional embedding vectors F′(

_(a)) and F′(

_(b)), a, b∈{1, 2, . . . ,

}, respectively, the Gaussian kernel Π(F′(

_(a)), F′(

_(b)))=exp(−ε(||F′(

_(a)−F′(

_(b))||₂)²), where ε is a scaling factor that normalizes F′(

_(a)) and F′(

_(b)), is used to project the data into the high-dimensional SVM space[27]. The general form of the SVM is given as,

$\begin{matrix}{{{\Theta \left( _{a} \right)} = {{\sum\limits_{y = 1}^{\tau}{ɛ_{\gamma}{\left( _{\gamma} \right)}{\Pi \left( {{F^{\prime}\left( _{a} \right)},{F^{\prime}\left( _{\gamma} \right)}} \right)}}} + b}},} & (9)\end{matrix}$

where γ∈{1, 2, . . . , τ} represents the τ marginal training samples(i.e. support vectors), b is the hyperplane bias estimated for Z_(tra),and ξ_(γ) is the model parameter determined by maximizing the objectivefunction [38], [39]. The true image label

(

_(b))∈{+1, −1} represents a high or low level of LI as determined by anexpert pathologist. The output of the SVM classifier, Θ(

_(a)) represents the distance from image scene

_(a) to the hyperplane. A testing image scene

_(a)∈Z_(tes) is determined to be classified correctly if

(

_(a))=sign [Θ(

_(a))].

The Gaussian kernel has recently become popular for classification in anumber of biomedical image processing applications [26], [45]. In anembodiment of the present invention, Gaussian kernel is used instead ofthe traditional linear kernel [38] because its non-linear projectioncreates additional separation between the data points in thehigh-dimensional SVM space and hence, simplifies the classificationtask.

One problem with the SVM classifier is that it is susceptible to biasfrom the arbitrary selection of training and testing samples [41]. Ak-fold cross-validation system and method [41] is used to mitigate thisbias by selecting training samples in a randomized manner and runningthe SVM classifier multiple times. First Z is divided randomly into ksubsets, while ensuring that images from each class

∈{+1, −1}, are proportionally represented in each of the k subsets. Inan embodiment of the present invention, binary classification isachieved by aggregating the samples labeled as medium LI with the low LIimages. Hence the goal of the SVM classifier was to distinguish 22 highand 19 low LI samples. All samples from k−1 subsets are pooled togetherto obtain Z_(tra) and the remaining subset is used as Z_(tes). For eachof the k iterations, an SVM classifier is trained with Z_(tra) andevaluated on Z_(tes); a new Z_(tes) and Z_(tra) being chosen at eachiteration so that all samples are evaluated. Using a value of k=3, theentire cross-validation algorithm was repeated over 100 trials and theresulting mean (μ_(ACC)) and standard deviation (σ_(ACC)) of theclassification accuracy obtained. Classification accuracy is defined asthe ratio between the number of correctly classified images and thetotal number of images in the dataset.

C. Formulation of Varma-Zisserman Texton-Based Classifier

The performance of the architectural feature set was compared againstthe Varma-Zisserman (VZ) texton-based features [46] for distinguishingbetween the 41 low and high LI images in Z. Textons have previously beenshown to be useful in applications related to content-based imageretrieval [48] and computer-aided classification [49] for digitizedcancer histopathology.

Step 1: All

_(tra)∈Z_(tra) are first convolved with the Maximum Response 8 (MR8)filter bank [46], which contains edge and bar filters at severalorientations and scales. An 8-dimensional MR8 feature vector f_(text)(c)is defined for each c∈

, ∀

_(tra)∈Z_(tra).

Step 2: Feature vectors f_(text) of all c∈

∀

_(tra)∈Z_(tra) are clustered using the K-means algorithm [41] and the Kcluster centers {c*₁, c*₂, . . . , c*_(K)} are defined as textons.

Step 3: For each c∈

_(tra), the closest corresponding texton c*_(j), j∈{1, 2, . . . , K} isidentified based on arg min_(j) ||f_(text)(c)−f_(text)(c*_(j))||₂. Atexton histogram is constructed for each

_(tra)∈Z_(tra) as

(

_(tra))=(H, h) where H is a 1D grid of K bins and h(j) represents thenumber of c∈

_(tra) identified as being closer to c*_(j) than any other texton.

Step 4: For each novel image scene

_(tes)∈Z_(tes), a corresponding texton histogram

(

_(tes)) is computed. The training image scene

*_(tra)∈Z_(tra) that is most similar to

_(tes) is identified based on

$\begin{matrix}{{_{tra}^{*} = {\underset{_{tra} \in Z_{tra}}{argmin}\left\lbrack {\chi^{2}\left( {{\mathcal{H}\left( _{tra} \right)},{\mathcal{H}\left( _{tes} \right)}} \right)} \right\rbrack}},} & (10)\end{matrix}$

where

²(

(

_(tra)),

(

_(tes))) is the Chi-squared distance [49] between the histograms of

_(tra) and

_(tes). If

(

_(tes))=

(

*_(tra)),

_(tes) is said to have been correctly classified; otherwise incorrectlyclassified. Additional details on the VZ texton approach may be found inreference [46].

The mean μ_(ACC) and standard deviation τ_(ACC) of the classificationaccuracy of the VZ-texton approach are calculated over 100 randomized3-fold cross-validation trials (Table III). These experiments wererepeated for K∈{2, 3, 5, 10}.

Results and Discussion A. Performance of Automated Lymphocyte DetectionSystem and Method

Over a total of |

^(auto)|=42,000 automatically detected lymphocyte nuclei for all

∈Z, the median partial Hausdorff distance was determined to be 3.70 μm(FIG. 11). Considering an average lymphocyte diameter of approximately 7μm, these results verify the ability of our detection system and methodto accurately detect lymphocytic infiltration in HER2+ BC histopathologyimagery.

B. Performance of Graph-Based and Nuclear Features

Table III shows the classification accuracies of the reduced feature setF′(Z) resulting from both automated and manual lymphocyte detection viathe SVM classifier. Note that the classification accuracies andvariances obtained from the automated detection (90.41%±2.97%) andmanual detection (94.59%±1.72%) system and methods are comparable,reflecting the efficacy of our automated detection system and method.Table III also reveals that the unreduced architectural features F(Z)(via automated lymphocyte detection) achieve a classification accuracyof 89.71%±2.83%, suggesting in turn that GE does not lead to anysignificant loss in class discriminatory information.

TABLE 3 Results of SVM classification accuracy (μ_(ACC), σ_(ACC)) for 41BC histopathology images using 100 3-fold cross-validation trials forautomated and manual lymphocyte detection with the architectural (bothreduced F′ and unreduced F) and VZ image features. Feature SetClassification Accuracy (%) F′ (Z) (automated detection) 90.41 ± 2.97 F′(Z) (manual detection) 94.59 ± 1.72 F(Z) (automated detection) 89.71 ±2.83 F(Z) (manual detection) 99.59 ± 0.92 VZ (K = 2)  48.17 ± 6.08 VZ (K= 3)  60.20 ± 5.66 VZ (K = 5)  58.63 ± 7.17 VZ (K = 10) 56.17 ± 7.63

In order to determine the optimal dimensionality for performingclassification, the architectural feature set F(Z) was reduced tovarious dimensionalities {2, 3, . . . , 10} via Graph Embedding. Foreach dimensionality, the corresponding μ_(ACC) and error bars (σ_(ACC))over 100 trials of randomized 3-fold cross-validation were calculated(FIG. 12). FIG. 12 suggests that classification accuracy is stable atlower dimensionalities and drops off slightly at higherdimensionalities.

C. Performance of Varma-Zisserman Features

The classification results (Table III) show that the Varma-Zisserman(VZ) textural features did not perform as well as the architecturalfeatures in distinguishing between BC histopathology samples with highand low levels of LI, with a maximum classification accuracy of60.20%±5.66%. This result suggests that texture descriptors are unableto quantitatively describe phenotypic changes due to variation in LIextent. Furthermore, both natural variations in histology andimperfections arising from slide preparation (FIG. 4) may have adverselyaffected the performance of textural features, since the dataset was notscreened to exclude such samples. Conversely, architectural featuresremain unaffected by these issues because they exploit intrinsicproperties such as lymphocyte size, shape, intensity, and arrangement toclassify the BC histopathology images.

D. Low-Dimensional Manifold Visualization

Apart from helping to deal with the curse of dimensionality problem forclassification, another important application of GE is in its ability tohelp visualize the underlying structure of the data. FIG. 13 shows thereduced dimensional representation (3 dimensions) of the highdimensional architectural and VZ-texture feature spaces. Note that the 3axes in each of FIGS. 13( a)-(d) reflect the principal Eigen vectorsobtained embedding the data via GE. Reducing the architectural featureset to 3 dimensions via Graph Embedding reveals the progression from lowto medium to high degrees of LI on a smooth, continuous manifold (FIGS.13( a), (b)). Conversely, the VZ features (FIGS. 13( c), (d)) neitherproduce a continuous manifold, nor appear to stratify samples based onLI extent. The plots in FIG. 13 further validate the quantitativeclassification results shown in Table III and reflect the efficacy ofarchitectural image features in stratifying extent of LI.

An embodiment of the present invention provides a quantitative CADxsystem for detecting and stratifying the extent of LI in digitized HER2+BC histopathology images. In one embodiment of the present invention, asystem and method is provided to automatically detect and grade theextent of LI using architectural features. In one embodiment of thepresent invention, non-linearly reducing the high-dimensionalarchitectural image feature space reveals the presence of a smooth,continuous manifold on which BC samples with progressively increasing LIare arranged in a continuum. The region-growing algorithm and subsequentMRF-based refinement of the present invention provides for isolation ofLI from the surrounding BC nuclei, stroma, and baseline level oflymphocytes. The architectural (graph-based and nuclear) features, whichexploit differences in arrangement of LI, were found to be moresuccessful than textural (VZ) features in distinguishing LI extent.While applying Graph Embedding to the high-dimensional feature space didnot adversely affect the classification accuracy of the SVM classifier,in conjunction with the architectural and morphological features it didallow for the visualization of a smooth data manifold. A similarmanifold was not reproducible with the VZ features, reflecting that thearchitectural and morphological features accurately capturedclass-discriminatory information regarding the spatial extent of LI. TheLI classification results were comparable for automated and manualdetection, reflecting the robustness of our detection algorithm. Theability of the image analysis algorithm of the present invention tostratify the extent of LI into low, medium, and high grades providessignificant translational and prognostic significance, and could bedeveloped into a prognostic test for predicting disease survival andpatient outcome. Furthermore, the methods comprising the CADx system andmethod of the present invention may be employed to characterize LIextent in other tissues and diseases.

The embodiments discussed above are exemplary and are not intended tolimit the scope of the present invention. The present invention relatesto an image-based computer-aided prognosis (CAP) system and method thatseeks to replicate the prognostic power of molecular assays inhistopathology and pathological processes, including but not limited tocancer.

REFERENCES

The disclosures of all patent and non-patent literature cited in thisapplication are hereby incorporated by reference in their entireties.

-   [01] M. B. Flanagan, D. J. Dabbs, et al., “Histopathologic variables    predict oncotype dx recurrence score.,” Mod Pathol, vol. 21, no. 10,    pp. 1255-1261, October 2008.-   [02] V. R. Korde, H. Bartels, et al., “Automatic segmentation of    cell nuclei in bladder and skin tissue for karyometric analysis,” in    Biophotonics. Proceedings of the SPIE., 2007, vol. 6633.-   [03] L. Latson et al., “Automated cell nuclear segmentation in color    images of hematoxylin and eosin-stained breast biopsy.,” Anal Quant    Cytol Histol, vol. 25, no. 6, pp. 321-331, December 2003.-   [04] X. Xie et al., “Mac: Magnetostatic active contour model,” IEEE    Trans on PAMI, vol. 30, no. 4, pp. 632-646, 2008.-   [05] A. Ramme, N. Devries, et al., “Semi-automated phalanx bone    segmentation using the expectation maximization algorithm.,” J Digit    Imaging, September 2008.-   [06] W. H. Wolberg, W. N. Street, et al., “Computer-derived nuclear    features distinguish malignant from benign breast cytology.,” Hum    Pathol, vol. 26, no. 7, pp. 792-796, July 1995.-   [07] C. Bilgin et al., “Cell-graph mining for breast tissue modeling    and classification.,” in IEEE EMBS, 2007, pp. 5311-5314.-   [08] B Hall, W Chen, et al., “A clinically motivated 2-fold    framework for quantifying and classifying immunohistochemically    stained specimens,” in MICCAI, 2007, pp. 287-294.-   [09] S. Doyle, S. Agner, A. Madabhushi, M. Feldman, and J.    Tomaszewski, “Automated grading of breast cancer histopathology    using spectral clustering with textural and architectural image    features,” in Proc. 5^(th) IEEE International Symposium on    Biomedical Imaging: From Nano to Macro, 2008, pp. 496-499.-   [10] C. Cortes and V. Vapnik, “Support-vector networks,” Machine    Learning, vol. 20, no. 3, pp. 273-297, 1995.-   [11] Jemal, R. Siegel, E. Ward, Y. Hao, J. Xu, T. Murray, and M. J.    Thun, “Cancer statistics, 2008.” Calif. Cancer J Clin, vol. 58, no.    2, pp. 71-96, 2008.-   [12] F. Bertucci and D. Birnbaum, “Reasons for breast cancer    heterogeneity.” J Biol, vol. 7, no. 2, p. 6, 2008.-   [13] J. R. van Nagell, E. S. Donaldson, E. G. Wood, and J. C.    Parker, “The significance of vascular invasion and lymphocytic    infiltration in invasive cervical cancer.” Cancer, vol. 41, no. 1,    pp. 228-234, January 1978.-   [14] S. Aaltomaa, P. Lipponen, M. Eskelinen, V. M. Kosma, S.    Marin, E. Alhava, and K. Syrjanen, “Lymphocyte infiltrates as a    prognostic variable in female breast cancer.” Eur J Cancer, vol.    28A, no. 4-5, pp. 859-864, 1992.-   [15] G. Alexe, G. S. Dalgin, D. Scanfeld, P. Tamayo, J. P.    Mesirov, C. DeLisi, L. Harris, N. Barnard, M. Martel, A. J.    Levine, S. Ganesan, and G. Bhanot, “High expression of    lymphocyte-associated genes in node-negative her2+ breast cancers    correlates with lower recurrence rates.” Cancer Res, vol. 67, no.    22, pp. 10 669-10 676, November 2007.-   [16] W. H. Wolberg, W. N. Street, D. M. Heisey, and O. L.    Mangasarian, “Computer-derived nuclear features distinguish    malignant from benign breast cytology.” Hum Pathol, vol. 26, no. 7,    pp. 792-796, July 1995.-   [17] B. Weyn, G. van de Wouwer, A. van Daele, P. Scheunders, D. van    Dyck, E. van Marck, and W. Jacob, “Automated breast tumor diagnosis    and grading based on wavelet chromatin texture description.”    Cytometry, vol. 33, no. 1, pp. 32-40, September 1998.-   [18] S. Petushi, F. U. Garcia, M. M. Haber, C. Katsinis, and A.    Tozeren, “Large-scale computations on histology images reveal    grade-differentiating parameters for breast cancer.” BMC Med    Imaging, vol. 6, p. 14, 2006.-   [19] B. Karacali and A. Tozeren, “Automated detection of regions of    interest for tissue microarray experiments: an image texture    analysis.” BMC Med Imaging, vol. 7, p. 2, 2007.-   [20] C. Gunduz, B. Yener, and S. H. Gultekin, “The cell graphs of    cancer.” Bioinformatics, vol. 20 Suppl 1, pp. i145-i151, August    2004.-   [21] B. H. Hall, M. Ianosi-Irimie, P. Javidian, W. Chen, S. Ganesan,    and D. J. Foran, “Computer-assisted assessment of the human    epidermal growth factor receptor 2 immunohistochemical assay in    imaged histologic sections using a membrane isolation algorithm and    quantitative analysis of positive controls.” BMC Med Imaging, vol.    8, p. 11, 2008.-   [22] S. J. Shin, E. Hyjek, E. Early, and D. M. Knowles,    “Intratumoral heterogeneity of her-2/neu in invasive mammary    carcinomas using fluorescence in-situ hybridization and tissue    microarray.” Int J Surg Pathol, vol. 14, no. 4, pp. 279-284, October    2006.-   [23] F. Schnorrenberg, C. Pattichis, K. Kyriacou, and C. Schizas,    “Computeraided detection of breast cancer nuclei,” IEEE Trans. on    Information Technology in Biomedicine, vol. 1, no. 2, pp. 128-140,    1997.-   [24] P. Bamford and B. Lovell, “Unsupervised cell nucleus    segmentation with active contours,” Signal Processing, vol. 71, no.    2, pp. 203-213, 1998.-   [25] L. Latson, B. Sebek, and K. A. Powell, “Automated cell nuclear    segmentation in color images of hematoxylin and eosin-stained breast    biopsy.” Anal Quant Cytol Histol, vol. 25, no. 6, pp. 321-331,    December 2003.-   [26] S. Naik, S. Doyle, S. Agner, A. Madabhushi, M. Feldman, and J.    Tomaszewski, “Automated gland and nuclei segmentation for grading of    prostate and breast cancer histopathology,” in Proc. 5th IEEE    International Symposium on Biomedical Imaging: From Nano to Macro    ISBI 2008, 2008, pp. 284-287.-   [27] W. H. Land, D. W. McKee, T. Zhukov, D. Song, and W. Qian, “A    kernelised fuzzy-support vector machine cad system for the diagnosis    of lung cancer from tissue images,” International Journal of    Functional Informatics and Personalised Medicine, vol. 1, pp.    26-52(27), 2008.-   [28] D. Glotsos, P. Spyridonos, D. Cavouras, P. Ravazoula, P.-A.    Dadioti, and G. Nikiforidis, “Automated segmentation of routinely    hematoxylin-eosin-stained microscopic images by combining support    vector machine clustering and active contour models.” Anal Quant    Cytol Histol, vol. 26, no. 6, pp. 331-340, December 2004.-   [29] H. Fatakdawala, A. Basavanhally, J. Xu, G. Bhanot, S.    Ganesan, M. Feldman, J. Tomaszewski, and A. Madabhushi, “Expectation    maximization driven geodesic active contour: Application to    lymphocyte segmentation on digitized breast cancer histopathology,”    in IEEE Conference on Bioinformatics and Bioengineering (BIBE),    2009.-   [30] Basavanhally, S. Agner, G. Alexe, G. Bhanot, S. Ganesan, and A.    Madabhushi, “Manifold learning with graph-based features for    identifying extent of lymphocytic infiltration from high grade,    her2+ breast cancer histology,” in Workshop on Microscopic Image    Analysis with Applications in Biology (in conjunction with    MICCAI), 2008. [Online]. Available:    http://www.miaab.org/miaab-2008-papers/27-miaab-2008-paper-21.pdf-   [31] J. P. Monaco, J. E. Tomaszewski, M. D. Feldman, M. Moradi, P.    Mousavi, A. Boag, C. Davidson, P. Abolmaesumi, and A. Madabhushi,    “Detection of prostate cancer from whole-mount histology images    using markov random fields,” in Workshop on Microscopic Image    Analysis with Applications in Biology (in conjunction with    MICCAI), 2008. [Online]. Available:    http://www.miaab.org/miaab-2008-papers/28-miaab-2008-paper-22.pdf-   [32] S. Hojjatoleslami and J. Kittler, “Region growing: a new    approach,” IEEE Trans. on Image Processing, vol. 7, no. 7, pp.    1079-1084, 1998.-   [33] S. Geman and D. Geman, “Stochastic relaxation, gibbs    distributions and the bayesian restoration of images,” IEEE    Transactions on Pattern Analysis and Machine Intelligence, vol. 6,    no. 6, pp. 721-741, November 1984.-   [34] J. Besag, “Statistical analysis of dirty pictures,” Journal of    Royal Statistic Society, vol. B, no. 68, pp. 259-302, 1986.-   [35] S. Doyle, M. Hwang, K. Shah, A. Madabhushi, M. Feldman, and J.    Tomaszeweski, “Automated grading of prostate cancer using    architectural and textural image features,” in Proc. 4th IEEE    International Symposium on Biomedical Imaging: From Nano to Macro,    2007, pp. 1284-1287.-   [36] R. E. Bellman, Adaptive Control Processes: A Guided Tour.    Princeton University Press, 1961.-   [37] J. Shi and J. Malik, “Normalized cuts and image segmentation,”    IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 22,    no. 8, pp. 888-905, August 2000.-   [38] G. Lee, C. Rodriguez, and A. Madabhushi, “Investigating the    efficacy of nonlinear dimensionality reduction system and methods in    classifying gene and protein expression studies,” IEEE/ACM    Transactions on Computational Biology and Bioinformatics, vol. 5,    no. 3, pp. 368-384, 2008.-   [39] C. Cortes and V. Vapnik, “Support-vector networks,” Machine    Learning, vol. 20, no. 3, pp. 273-297, 1995.-   [40] M. W. Schwarz, W. B. Cowan, and J. C. Beatty, “An experimental    comparison of rgb, yiq, lab, hsv, and opponent color models,” ACM    Trans. on Graphics, vol. 6, no. 2, pp. 123-158, 1987.-   [41] R. Duda, P. Hart, and D. Stork, Pattern Classification. John    Wiley & Sons, 2001.-   [42] Papoulis, Probability, Random Variables, and Stochastic    Processes, None, Ed. McGraw Hill, Inc., 1965.-   [43] J. Sudbo, R. Marcelpoil, and A. Reith, “New algorithms based on    the voronoi diagram applied in a pilot study on normal mucosa and    carcinomas.” Anal Cell Pathol, vol. 21, no. 2, pp. 71-86, 2000.-   [44] D. Huttenlocher, G. Klanderman, and W. Rucklidge, “Comparing    images using the hausdorff distance,” IEEE Trans. on Pattern    Analysis and Machine Intelligence, vol. 15, no. 9, pp. 850-863,    1993.-   [45] El-Naga, Y. Yang, M. N. Wernick, N. P. Galatsanos, and R. M.    Nishikawa, “A support vector machine approach for detection of    microcalcifications.” IEEE Trans Med Imaging, vol. 21, no. 12, pp.    1552-1563, December 2002. [Online]. Available:    http://dx.doi.org/10.1109/TMI.2002.806569-   [46] M. Varma and A. Zisserman, “A statistical approach to texture    classification from single images,” Int J Comput Vision, vol. 62,    no. 1-2, pp. 61-81, 2005.-   [47] F. Schnorrenberg, C. S. Pattichis, C. N. Schizas, and K.    Kyriacou, “Content-based retrieval of breast cancer biopsy slides.”    Technol Health Care, vol. 8, no. 5, pp. 291-297, 2000.-   [48] O. Tuzel, L. Yang, P. Meer, and D. J. Foran, “Classification of    hematologic malignancies using texton signatures,” Pattern Analysis    & Applications, vol. 10, no. 4, pp. 277-290, 2007.-   [49] J. Naik, S. Doyle, A. Basavanhally, S. Ganesan, M. Feldman, J.    Tomaszewski, and A. Madabhushi, “A boosted distance metric:    Application to content based image retrieval and classification of    digitized histopathology,” in SPIE Medical Imaging, vol. 7260, 2009.    [Online]. Available: http://dx.doi.org/10.1117/12.813931

1. An image-based risk score predictor method for measuring cancerextent to evaluate disease outcome in cancer patients using digitizedhistopathology comprising: i. scanning stained histopathology slidesinto a computer using a high resolution whole slide scanner; ii.detecting cancer nuclei using an Expectation-Maximization basedalgorithm; iii. constructing the Delaunay Triangulation and MinimumSpanning Tree graphs using the centers of individual detected cancernuclei as vertices; iv. extracting image-derived features describing thearrangement of the cancer nuclei from each image; v. projectinghigh-dimensional image derived feature vector into a reduced 3Dembedding space via Graph Embedding; vi. unwrapping the 3D embeddinginto a 1D scale to define image-based risk scores for poor,intermediate, and good outcome; vii. determining an image-basedrecurrence score to distinguish between low, intermediate, and highcancer grades by uncovering the grade labels of the samples on the 1Dline and their relative locations to predict prognosis.
 2. Animage-based risk score predictor method for measuring extent ofpathological processes to evaluate disease outcome in patients usingdigitized histopathology comprising: i. scanning stained histopathologyslides into a computer using a high resolution whole slide scanner; ii.detecting pathological nuclei using an Expectation-Maximization basedalgorithm; iii. constructing the Delaunay Triangulation and MinimumSpanning Tree graphs using the centers of individual detectedpathological nuclei as vertices; iv. extracting image-derived featuresdescribing the arrangement of the pathological nuclei from each image;v. projecting high-dimensional image derived feature vector into areduced 3D embedding space via Graph Embedding; vi. unwrapping the 3Dembedding into a 1D scale to define image-based risk scores for poor,intermediate, and good outcome; vii. determining an image-basedrecurrence score to distinguish between low, intermediate, and highcancer grades by uncovering the grade labels of the samples on the 1Dline and their relative locations to predict prognosis.
 3. Animage-based risk score predictor method for measuring cancer extent toevaluate disease outcome in node-negative, estrogen receptor-positivebreast cancer patients using digitized histopathology comprising: i.scanning stained histopathology slides into a computer using a highresolution whole slide scanner; ii. detecting cancer nuclei using anExpectation-Maximization based algorithm; iii. constructing the DelaunayTriangulation and Minimum Spanning Tree graphs using the centers ofindividual detected cancer nuclei as vertices; iv. extractingimage-derived features describing the arrangement of the cancer nucleifrom each image; v. projecting high-dimensional image derived featurevector into a reduced 3D embedding space via Graph Embedding; vi.unwrapping the 3D embedding into a 1D scale to define image-based riskscores for poor, intermediate, and good outcome; vii. determining animage-based recurrence score to distinguish between low, intermediate,and high cancer grades by uncovering the grade labels of the samples onthe 1D line and their relative locations to predict prognosis.
 4. Amethod for measuring extent of lymphocytic infiltration to evaluatedisease outcome in breast cancer patients expressing the human epidermalgrowth factor receptor 2 (HER2) comprising: i. scanning stainedhistopathology slides into a computer using a high resolution wholeslide scanner; ii. detecting lymphocyte nuclei using a combination ofregion-growing and Markov Random Field algorithms; iii. constructing aVoronoi Diagram, Delaunay Triangulation, and Minimum Spanning Tree usingthe centers of individual detected lymphocyte nuclei as vertices; iv.extracting image-derived features describing the arrangement of thelymphocyte nuclei from each image; v. projecting high-dimensional imagederived feature vector into a reduced 3D embedding space via GraphEmbedding.